منابع مشابه
Preconditioned Stochastic Gradient Descent
Stochastic gradient descent (SGD) still is the workhorse for many practical problems. However, it converges slow, and can be difficult to tune. It is possible to precondition SGD to accelerate its convergence remarkably. But many attempts in this direction either aim at solving specialized problems, or result in significantly more complicated methods than SGD. This paper proposes a new method t...
متن کاملOn the Performance of Preconditioned Stochastic Gradient Descent
This paper studies the performance of preconditioned stochastic gradient descent (PSGD), which can be regarded as an enhance stochastic Newton method with the ability to handle gradient noise and non-convexity at the same time. We have improved the implementation of PSGD, unrevealed its relationship to equilibrated stochastic gradient descent (ESGD) and feature normalization, and provided a sof...
متن کاملPreconditioned Stochastic Gradient Descent Optimisation for Monomodal Image Registration
We present a stochastic optimisation method for intensity-based monomodal image registration. The method is based on a Robbins-Monro stochastic gradient descent method with adaptive step size estimation, and adds a preconditioning matrix. The derivation of the pre-conditioner is based on the observation that, after registration, the deformed moving image should approximately equal the fixed ima...
متن کاملRecurrent neural network training with preconditioned stochastic gradient descent
Recurrent neural networks (RNN), especially the ones requiring extremely long term memories, are difficult to training. Hence, they provide an ideal testbed for benchmarking the performance of optimization algorithms. This paper reports test results of a recently proposed preconditioned stochastic gradient descent (PSGD) algorithm on RNN training. We find that PSGD may outperform Hessian-free o...
متن کاملVariational Stochastic Gradient Descent
In Bayesian approach to probabilistic modeling of data we select a model for probabilities of data that depends on a continuous vector of parameters. For a given data set Bayesian theorem gives a probability distribution of the model parameters. Then the inference of outcomes and probabilities of new data could be found by averaging over the parameter distribution of the model, which is an intr...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: IEEE Transactions on Neural Networks and Learning Systems
سال: 2018
ISSN: 2162-237X,2162-2388
DOI: 10.1109/tnnls.2017.2672978